Exploring Keyphrase Extraction and IPC Classification Vectors for Prior Art Search
Abstract
In this paper we describe experiments conducted for the CLEF-IP 2011 Prior Art Retrieval track. We examined the impact of (1) using keyphrase extraction to generate queries from the input patent and (2) using the citation network and International Patent Classification (IPC) class vectors to rank patents. Variations of a popular keyphrase extraction technique were explored for extracting and scoring the terms of the query patent; these terms are then used as queries to retrieve similar patents. In the second approach, we use a two-stage retrieval model. Each patent is represented as an IPC class vector, and the citation network is used to propagate these vectors from a node (patent) to its neighbors (cited patents). In the first stage, candidate patents are found by comparing the query vector with the vectors of patents in the corpus; in the second stage, text-based search re-ranks this candidate set to improve precision. Finally, we also extract the citations present within the text of a query patent and add them to the result set. Adding these in-text citations to the results yields a significant improvement in Mean Average Precision (MAP).
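The abstract does not name the keyphrase extraction technique it varies, so the following is only a minimal sketch of the general idea (score the terms of the query patent, keep the top ones as a query), using plain TF-IDF as a swapped-in scoring function. The function names, the tiny stopword list and the `k` cutoff are hypothetical illustrations, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic tokens, a tiny stopword list,
# and a length filter that drops very short words.
TOKEN_RE = re.compile(r"[a-z]+")
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "are", "which"}


def tokenize(text):
    return [t for t in TOKEN_RE.findall(text.lower())
            if t not in STOPWORDS and len(t) > 2]


def tfidf_query_terms(query_text, corpus_texts, k=5):
    """Score query-patent terms by TF-IDF (a stand-in technique) and return the top k."""
    n_docs = len(corpus_texts)
    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for doc in corpus_texts:
        df.update(set(tokenize(doc)))
    # Term frequency within the query patent.
    tf = Counter(tokenize(query_text))
    # Smoothed IDF so unseen terms do not divide by zero.
    scores = {t: c * math.log((n_docs + 1) / (df[t] + 1)) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:k]]
```

The returned terms can be joined with spaces and sent to any text search engine as the query; the paper's variations would change how terms are extracted and weighted, not this overall shape.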
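The two-stage model can be sketched roughly as follows, under several assumptions that the abstract does not state: IPC vectors are simple code counts, propagation is a single step in which each citing patent adds `alpha` times its vector to each patent it cites, the first stage ranks by cosine similarity, and the second stage re-orders candidates with some text score supplied as a callable. All names here (`ipc_vector`, `propagate`, `rank_by_ipc`, `rerank`, `alpha`) are hypothetical.

```python
import math
from collections import defaultdict


def ipc_vector(ipc_codes):
    """Hypothetical count-based IPC class vector: IPC code -> frequency."""
    vec = defaultdict(float)
    for code in ipc_codes:
        vec[code] += 1.0
    return dict(vec)


def propagate(vectors, citations, alpha=0.5):
    """One propagation step: push each citing patent's vector, scaled by alpha,
    onto the patents it cites. Uses the original vectors as the source."""
    result = {pid: dict(vec) for pid, vec in vectors.items()}
    for citing, cited_list in citations.items():
        src = vectors.get(citing, {})
        for cited in cited_list:
            target = result.setdefault(cited, {})
            for code, w in src.items():
                target[code] = target.get(code, 0.0) + alpha * w
    return result


def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_ipc(query_vec, corpus_vecs, top_k=10):
    """First stage: candidate patents ordered by IPC-vector similarity."""
    scored = [(pid, cosine(query_vec, v)) for pid, v in corpus_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def rerank(candidates, text_score):
    """Second stage: re-order the candidate set by a text-based score callable."""
    return sorted(candidates, key=text_score, reverse=True)


# Toy example: P1 cites P3, so P3 inherits part of P1's IPC profile.
vectors = {
    "P1": ipc_vector(["A61K", "A61P"]),
    "P2": ipc_vector(["G06F"]),
    "P3": ipc_vector(["A61K"]),
}
propagated = propagate(vectors, {"P1": ["P3"]}, alpha=0.5)
ranking = rank_by_ipc(ipc_vector(["A61K"]), propagated, top_k=3)
```

Keeping the stages separate mirrors the abstract's split: the IPC stage is cheap and recall-oriented, while the text score, whatever engine provides it, is only computed for the surviving candidates.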
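The final step, adding citations found in the query patent's own text to the results, might look like the sketch below. The regular expression is a hypothetical, deliberately simple pattern for EP and US numbers such as "EP 1 234 567" or "US 5,123,456"; real patent text would need a more robust citation grammar. Placing in-text citations ahead of the retrieved ranking is also an assumption.

```python
import re

# Hypothetical pattern: "EP" or "US", optional whitespace, then digits that
# may be separated by single spaces or commas (e.g. "EP 1 234 567", "US 5,123,456").
CITATION_RE = re.compile(r"\b(EP|US)\s*((?:\d[ ,]?)+\d)")


def extract_citations(text, min_digits=6):
    """Return normalized patent numbers cited in the text, in order of first appearance."""
    found = []
    for prefix, number in CITATION_RE.findall(text):
        digits = re.sub(r"[ ,]", "", number)
        # Skip short numbers that are more likely figures or claim references.
        if len(digits) >= min_digits:
            pid = prefix + digits
            if pid not in found:
                found.append(pid)
    return found


def merge_results(in_text_citations, retrieved):
    """Put in-text citations first, then the retrieved ranking, without duplicates."""
    merged, seen = [], set()
    for pid in list(in_text_citations) + list(retrieved):
        if pid not in seen:
            seen.add(pid)
            merged.append(pid)
    return merged
```

For example, the text "As described in EP 1 234 567 B1 and in US 5,123,456, ..." yields `["EP1234567", "US5123456"]`, which would then be merged ahead of the system's own ranked list.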
Similar resources
State of the Art of Automatic Keyphrase Extraction Methods (État de l'art des méthodes d'extraction automatique de termes-clés) [in French]
This article presents the state of the art of automatic keyphrase extraction methods. The aim of the automatic keyphrase extraction task is to extract the most representative terms of a document. Automatic keyphrase extraction methods can be divided into two categories: supervised methods and unsupervised methods. For supervised me...
CLEF-IP 2011 Working Notes: Utilizing Prior Art Candidate Search Results for Refined IPC Classification
For the refined IPC classification in the CLEF-IP 2011 task, we constructed a KNN classification system that uses PAC (Prior Art Candidate) search results as neighbors. We also slightly modified the neighborhood evaluation. We also furnished a simple PAC search system. We produced several runs in both PAC search and classification, and evaluated our system. Our test s...
Local Word Vectors Guiding Keyphrase Extraction
Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e. embeddings trained from the single document und...
Experiments with Citation Mining and Key-Term Extraction for Prior Art Search
This technical note presents the system built for the IP track of CLEF 2010 based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially realized for CLEF IP 2009. We largely reused the system of the previous CLEF IP but at a relatively smaller scale and with the improvement of three main components: • A new citation mining tool based on C...
Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors
Keyphrase extraction from a given document is a difficult task that requires not only local statistical information but also extensive background knowledge. In this paper, we propose a graph-based ranking approach that uses information supplied by word embedding vectors as the background knowledge. We first introduce a weighting scheme that computes informativeness and phraseness scores of word...
Publication date: 2011